{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "62cb51d5",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Lesson 6: Code extension by adding a new segment in the ELF"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "504eeea5",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "**Objectives**: add a new segment in an ELF; use the OFRAK PatchMaker to convert a C patch into a binary patch, including a linking step"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33855fd0",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Section Overview\n",
    "In this section, we will replace a call to `puts` with a wrapper function that converts all lower-case characters to upper-case in the target x86 ELF `hello_world`. We will explore what happens when and why patches can be applied in various ways and then learn about OFRAK PatchMaker in order to implement the objective."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c82e4268",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "### Target binary `hello_world`\n",
    "from ofrak_tutorial.helper_functions import create_hello_world_binary\n",
    "\n",
    "# Hello world source code for reference\n",
    "HELLO_WORLD_SOURCE = r\"\"\"\n",
    "#include <stdio.h>\n",
    "int main() {\n",
    "   // Can you patch `puts` to capitalize all lower-case characters in a provided string?\n",
    "   puts(\"Hello, World!\\n\");\n",
    "   return 0;\n",
    "}\n",
    "\"\"\"\n",
    "\n",
    "create_hello_world_binary()\n",
    "\n",
    "### Patch source `uppercase_and_print`\n",
    "c_patch = r\"\"\"\n",
    "extern int puts(char *str); // This is the address of what Ghidra calls 'puts@GLIBC'\n",
    "\n",
    "void uppercase_and_print(char *text)\n",
    "{\n",
    "    // We can't modify the string directly since it's stored in a RO segment\n",
    "    // so we need to copy it and provide the modified string to puts.\n",
    "\n",
    "    char str[15] = {0};\n",
    "    for(int i=0; i<14; i++){\n",
    "        if(text[i] >= 0x61 && text[i] <= 0x7A)\n",
    "            str[i] = text[i]-0x20;\n",
    "        else\n",
    "            str[i] = text[i];\n",
    "    }\n",
    "    puts(str);\n",
    "}\n",
    "\"\"\"\n",
    "\n",
    "c_patch_filename = \"c_patch.c\"\n",
    "with open(c_patch_filename, \"w\") as f:\n",
    "    f.write(c_patch)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02b3a874",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "We shall allocate in the free-space of `hello_world` the wrapper function `uppercase_and_print`, which is called instead of `puts` in `main`. `uppercase_and_print` will copy the string from RO memory, modify it, then call the library function `extern int puts(char *str)` with the modified string buffer.\n",
    "\n",
    "The final result of the patch, once disassembled with Ghidra, should look like this:\n",
    "```\n",
    "                             **************************************************************\n",
    "                             *                          FUNCTION                          *\n",
    "                             **************************************************************\n",
    "                             undefined main()\n",
    "             undefined         AL:1           <RETURN>\n",
    "                             main                                            XREF[5]:     Entry Point(*),\n",
    "                                                                                          _start:0040105d(*),\n",
    "                                                                                          _start:0040105d(*), 00402038,\n",
    "                                                                                          004020d8(*)\n",
    "        00401122 55              PUSH       RBP\n",
    "        00401123 48 89 e5        MOV        RBP,RSP\n",
    "        00401126 48 8d 3d        LEA        RDI,[s_Hello,_World!_00402004]                   = \"Hello, World!\"\n",
    "                 d7 0e 00 00\n",
    "        0040112d e8 ce 3e        CALL       FUN_00405000                                     undefined FUN_00405000()\n",
    "                 00 00\n",
    "        00401132 b8 00 00        MOV        EAX,0x0\n",
    "                 00 00\n",
    "        00401137 5d              POP        RBP\n",
    "        00401138 c3              RET\n",
    "\n",
    "```\n",
    "\n",
    "In both the original and patched binaries, the address of \"Hello, World!\\n\" is loaded from the read-only segment containing `.rodata`. In the patched binary, the call to `puts` has been replaced with a call to a function Ghidra identifies as `FUN_00405000`. `FUN_00405000` is our compiled function `uppercase_and_print`, which when disassembled looks like this:\n",
    "\n",
    "```\n",
    "                             //\n",
    "                             // segment_10\n",
    "                             // Loadable segment  [0x405000 - 0x406fff] (disabled execute\n",
    "                             // ram:00405000-ram:00406fff\n",
    "                             //\n",
    "                             **************************************************************\n",
    "                             *                          FUNCTION                          *\n",
    "                             **************************************************************\n",
    "                             undefined FUN_00405000()\n",
    "             undefined         AL:1           <RETURN>\n",
    "             undefined8        Stack[-0x9]:8  local_9                                 XREF[1]:     00405004(RW)\n",
    "             undefined1        Stack[-0x10]:1 local_10                                XREF[2]:     0040500a(RW),\n",
    "                                                                                                   00405036(*)\n",
    "                             FUN_00405000                                    XREF[2]:     00400280(*), main:0040112d(c)\n",
    "        00405000 48 83 ec 18     SUB        RSP,0x18\n",
    "        00405004 48 83 64        AND        qword ptr [RSP + local_9],0x0\n",
    "                 24 0f 00\n",
    "        0040500a 48 83 64        AND        qword ptr [RSP + local_10],0x0\n",
    "                 24 08 00\n",
    "        00405010 6a f2           PUSH       -0xe\n",
    "        00405012 58              POP        RAX\n",
    "                             LAB_00405013                                    XREF[1]:     00405034(j)\n",
    "        00405013 48 85 c0        TEST       RAX,RAX\n",
    "        00405016 74 1e           JZ         LAB_00405036\n",
    "        00405018 0f b6 4c        MOVZX      ECX,byte ptr [RDI + RAX*0x1 + 0xe]\n",
    "                 07 0e\n",
    "        0040501d 8d 51 9f        LEA        EDX,[RCX + -0x61]\n",
    "        00405020 8d 71 e0        LEA        ESI,[RCX + -0x20]\n",
    "        00405023 80 fa 1a        CMP        DL,0x1a\n",
    "        00405026 40 0f b6 d6     MOVZX      EDX,SIL\n",
    "        0040502a 0f 43 d1        CMOVNC     EDX,ECX\n",
    "        0040502d 88 54 04 16     MOV        byte ptr [RSP + RAX*0x1 + 0x16],DL\n",
    "        00405031 48 ff c0        INC        RAX\n",
    "        00405034 eb dd           JMP        LAB_00405013\n",
    "                             LAB_00405036                                    XREF[1]:     00405016(j)\n",
    "        00405036 48 8d 7c        LEA        RDI=>local_10,[RSP + 0x8]\n",
    "                 24 08\n",
    "        0040503b e8 f0 bf        CALL       <EXTERNAL>::puts                                 int puts(char * __s)\n",
    "                 ff ff\n",
    "        00405040 48 83 c4 18     ADD        RSP,0x18\n",
    "        00405044 c3              RET\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a25bf6e2",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Strategies of Patching\n",
    "#### Function Wrapping\n",
    "**Wrapping** is relatively straightforward as it typically only requires allocating the new code in free space and changing the destination address of a function call to point to that new code. This is the strategy taken for this tutorial section.\n",
    "\n",
    "Normally wrapping introduces an extra indirection to the call-chain for the function being wrapped. The original `puts` we're replacing is already a thunk that itself jumps to the extern function `puts@GLIBC`; since the `puts` function we're replacing branches to `puts@GLIBC` anyway, one may erroneously assume that this particular case doesn't introduce additional indirection.\n",
    "\n",
    "But in our patch we implement a /call/ to the thunk `puts` from within `uppercase_and_print`, whereas the original `puts` thunk /jumps/ to `puts@GLIBC` as part of a tail-call, skipping stack frame allocation operations that a call would imply.\n",
    "```\n",
    "Original / unaltered `hello_world` x86 ELF `puts` callgraph\n",
    "\n",
    " ________  CALL   ________  JUMP ______________\n",
    "|  main  |  -->  |  puts  |  +  |  puts@GLIBC  |\n",
    "|________|       |________|     |______________|\n",
    "      ^_________________________________|\n",
    "        TAIL-RET TO PC AFTER `puts` CALL\n",
    "```\n",
    "```\n",
    "Patched `hello_world` x86 ELF `puts` callgraph\n",
    "\n",
    " ________  CALL   _______________________  CALL   ________   JUMP ______________\n",
    "|  main  |  -->  |  uppercase_and_print  |  -->  |  puts  |   +  |  puts@GLIBC  |\n",
    "|________|       |___aka FUN_00405000____|       |________|      |______________|\n",
    "      ^__________________ ^ _____|                                       |\n",
    "      :                   |______________________________________________|\n",
    "      :                            TAIL-RET TO PC AFTER `puts` CALL\n",
    "      :\n",
    "      RET TO PC AFTER `uppercase_and_print` CALL\n",
    "```\n",
    "In our case, it isn't guaranteed that the argument provided to `uppercase_and_print` would be writeable, so we have to copy it first to writeable memory, capitalize all the characters, and call `puts@GLIBC` with a pointer to the modified buffer. Because we are allocating a new buffer, we may as well allocate an extra stack frame for it between the `main` parent and the `puts@GLIBC` call; any other solution would require relocating the string or making the segment read-writeable.\n",
    "\n",
    "#### Function replacement\n",
    "Alternatively the calling function could be **replaced** to capitalize all characters in the string before passing it to `puts`. In our case, we'd replace `main` with our patch. This would require deciding on how much behavior to replicate of the original caller into the patched caller, but it also allows for us to deal with more complex transformations and non-composable modifications on programs, and keeping these modifications inlined.\n",
    "\n",
    "Similar to how we do it for function wrapping, the patched caller is allocated into free-space, then the binary containing that patch is re-linked to use the patched caller in place of the original.\n",
    "\n",
    "Since we know the string that we're replacing, we can allocate it from RW memory and capitalize its letters in-place instead.\n",
    "\n",
    "```\n",
    "Caller-replaced `hello_world` x86 ELF `puts` callgraph\n",
    "\n",
    "int alternative_main() {\n",
    "    // Initialize the string in modifyable memory\n",
    "    char text[] = { \"Hello, World!\\n\" };\n",
    "\n",
    "    // Capitalize in-place\n",
    "    for (int i = 0; i < sizeof text; i++) {\n",
    "        if(text[i] >= 0x61 && text[i] <= 0x7A)\n",
    "            text[i] = text[i] - 0x20;\n",
    "    }\n",
    "\n",
    "    // Print\n",
    "    puts(text);\n",
    "\n",
    "    return 0;\n",
    "}\n",
    "\n",
    " ____________________  CALL   ________  JUMP ______________\n",
    "|  alternative_main  |  -->  |  puts  |  +  |  puts@GLIBC  |\n",
    "|____________________|       |________|     |______________|\n",
    "                  ^_________________________________|\n",
    "                    TAIL-RET TO PC AFTER `puts` CALL\n",
    "```\n",
    "\n",
    "#### Data-only patching\n",
    "Generally if the data is provided in RO memory, we could just replace the string with a capitalized variant, instead of modifying the program to capitalize the string at runtime. [Lesson 1](./1_simple_string_modification.ipynb) addresses data-only patching. For this lesson we will perform a code patch for demonstration purposes.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d28a672",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### Considering other places to patch\n",
    "Could we hook something else to capitalize strings without copying from read-only memory? We could look for an existing function that already performs an array copy to writeable memory, and replace it with a modified version that also capitalizes each character as they are written to the destination. Such a function does not exist in our `hello_world` example as the string is directly passed to `puts` from RO memory. However,  we may consider patching GLIBC instead. If you expand the call-chain for GLIBC `puts` then it should look something like this:\n",
    "\n",
    "```\n",
    "Printing a string to STDOUT, from `main` to the `write` syscall:\n",
    "\n",
    " ________         ________         ______________         ____________         _______________________\n",
    "|  main  |  -->  |  puts  |   +   |  puts@GLIBC  |  -->  |  _IO_puts  |  -->  |  _IO_new_file_xsputn  |\n",
    "|________|       |________|       |______________|       |____________|       |_______________________|\n",
    "                                                                                   CALL |   CALL |\n",
    "                                                                                       \\|/       |\n",
    "                                                                                     ___`___     |\n",
    "                                                                                   [ mempcpy ]   |\n",
    "                                                                                                 |\n",
    "                                                                                                \\|/\n",
    "                              _________________         ______________________        ______`_________\n",
    "                             |  syscall write  |  <--  |  _IO_new_file_write  |  <-- |  new_do_write  |\n",
    "                             |_________________|       |______________________|      |________________|\n",
    "\n",
    "```\n",
    "\n",
    "\n",
    "\n",
    "It looks like [_IO_new_file_xsputn](https://elixir.bootlin.com/glibc/glibc-2.29/source/libio/fileops.c#L1180) [line 1243](https://elixir.bootlin.com/glibc/glibc-2.29/source/libio/fileops.c#L1243) is a macro expanded into a memcpy. One could replace the [mempcpy](https://elixir.bootlin.com/glibc/glibc-2.36/source/sysdeps/i386/i686/mempcpy.S#L37) call with something that performs a `BYTE != 0x20 ? BYTE & ~ 0x20 : BYTE` operation before writing each byte to the destination. The modified `mempcpy` would be embedded in the free-space of the target binary and be called in `__mempcpy`'s place on [line 1243](https://elixir.bootlin.com/glibc/glibc-2.29/source/libio/fileops.c#L1243).\n",
    "\n",
    "While this patch would mitigate the introduction of an extra indirection by replacing (not wrapping) an existing `memcpy`, there are two major drawbacks to this strategy: additional machinery would need to be implemented to prevent the modified `mempcpy` from modifying bytes un-intended for STDOUT, and this patch depends on specific versions of GLIBC to be loaded into the target binary's runtime. This patch seems more complicated to implement properly and is less portable due to its dependence on system-provided runtime libraries; nonetheless, whether it is suitable is dependent on the constraints a patch author is working with."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97bc40d0",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "**Overall**, a patch author has to wear two hats simultaneously: a patch author needs to apply the mindset of an engineer to decide on designs that fit some set of constraints, while also applying the mindset of a scientist to model behavior when modifications are made (and possibly on a binary without available source code). Certain patches may favor modularity and portability while other patch designs may favor performance, resulting in less portable and lower-level optimizations being applied; and in either case there may be requirements on timing and state guarantees that the modification has to satisfy."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c45bb82a",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Using OFRAK PatchMaker\n",
    "\n",
    "In order to apply this patch we'll need to correctly compile and link the wrapper function `uppercase_and_print` into the `hello_world` ELF binary. We'll use OFRAK's *PatchMaker* to facilitate building and injecting the patch.\n",
    "\n",
    "We can break this task down into 6 steps:\n",
    "\n",
    "1. **Unpack** `hello_world` onto an OFRAK resource tree;\n",
    "2. **Extend** `hello_world` with a new ELF segment to hold the code for `uppercase_and_print`, by applying OFRAK _LiefAddSegmentModifier_ to the resource tree;\n",
    "3. **Patch** the call instruction to `puts` (in main) with `uppercase_and_print` using OFRAK core; then finally,\n",
    "4. **Compile** the `uppercase_and_print` patch source using OFRAK _PatchMaker_, after defining the program attributes, toolchain and symbols we wish to re-link to `hello_world`;\n",
    "5. **Inject** the extended ELF segment with the compiled patch blob using OFRAK _BinaryPatchModifier_;\n",
    "6. **Pack** the OFRAK resource we've modified back into an executable, so we can run and test our work!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84425592",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "\n",
    "#### [1] Unpack `hello_world` onto an OFRAK resource tree\n",
    "\n",
    "First lets instantiate OFRAK with the Ghidra analyzer backend and create a root resource from the `hello_world` binary we want to patch:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "8e378012",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using OFRAK Community License.\n"
     ]
    }
   ],
   "source": [
    "import ofrak_ghidra\n",
    "from ofrak import OFRAK\n",
    "\n",
    "ofrak = OFRAK()\n",
    "ofrak.discover(ofrak_ghidra)\n",
    "binary_analysis_context = await ofrak.create_ofrak_context()\n",
    "\n",
    "root_resource = await binary_analysis_context.create_root_resource_from_file(\"hello_world\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "47a8f988",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### [2] Extend the `hello_world` resource with a new ELF segment with OFRAK core\n",
    "\n",
    "With the root resource created, let's start with the creation of the new segment. We will use OFRAK's `LiefAddSegmentModifier`, leveraging the [LIEF project](https://lief-project.github.io/) to add an ELF segment which will contain the code for `uppercase_and_print`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "5d2912b0",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "tags": [
     "nbval-ignore-output"
    ]
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "openjdk version \"11.0.23\" 2024-04-16\n",
      "OpenJDK Runtime Environment (build 11.0.23+9-post-Debian-1deb11u1)\n",
      "OpenJDK 64-Bit Server VM (build 11.0.23+9-post-Debian-1deb11u1, mixed mode)\n",
      "openjdk version \"11.0.23\" 2024-04-16\n",
      "OpenJDK Runtime Environment (build 11.0.23+9-post-Debian-1deb11u1)\n",
      "OpenJDK 64-Bit Server VM (build 11.0.23+9-post-Debian-1deb11u1, mixed mode)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "new segment: ElfProgramHeader(segment_index=10, p_type=1, p_offset=16384, p_vaddr=4214784, p_paddr=4214784, p_filesz=8192, p_memsz=8192, p_flags=5, p_align=4096)\n"
     ]
    }
   ],
   "source": [
    "from ofrak import ResourceFilter\n",
    "from ofrak.core import LiefAddSegmentConfig, LiefAddSegmentModifier, ElfProgramHeader\n",
    "\n",
    "PAGE_ALIGN = 0x1000\n",
    "\n",
    "\n",
    "async def add_and_return_segment(elf_resource, vaddr, size):\n",
    "    \"\"\"Add a segment to `elf_resource`, of size `size` at virtual address `vaddr`,\n",
    "    returning this new segment resource after unpacking.\"\"\"\n",
    "\n",
    "    config = LiefAddSegmentConfig(vaddr, PAGE_ALIGN, [0 for _ in range(size)], \"rx\")\n",
    "    await elf_resource.run(LiefAddSegmentModifier, config)\n",
    "\n",
    "    await elf_resource.unpack_recursively()\n",
    "\n",
    "    # Get our newly added segment. First get all ElfProgramHeaders, then return the one\n",
    "    # with our virtual address.\n",
    "    file_segments = await elf_resource.get_descendants_as_view(\n",
    "        ElfProgramHeader, r_filter=ResourceFilter(tags=(ElfProgramHeader,))\n",
    "    )\n",
    "    return [seg for seg in file_segments if seg.p_vaddr == vaddr].pop()\n",
    "\n",
    "\n",
    "new_segment = await add_and_return_segment(root_resource, 0x405000, 0x2000)\n",
    "\n",
    "print(f\"new segment: {new_segment}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7da12d55",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### [3] Patch the call instruction to `puts` (in main) with `uppercase_and_print` using OFRAK core\n",
    "\n",
    "Let's also get the instruction patch to call `uppercase_and_print` (which will start where our new segment starts) instead of `puts` out of the way:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "3421c337",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from ofrak.core import ComplexBlock, Instruction\n",
    "from ofrak import ResourceAttributeValueFilter\n",
    "\n",
    "\n",
    "async def call_new_segment_instead(root_resource, new_segment):\n",
    "    \"\"\"Replace the original `call` instruction in main with a call to the start of `new_segment`.\"\"\"\n",
    "    main_cb = await root_resource.get_only_descendant_as_view(\n",
    "        v_type=ComplexBlock,\n",
    "        r_filter=ResourceFilter(\n",
    "            attribute_filters=(ResourceAttributeValueFilter(ComplexBlock.Symbol, \"main\"),)\n",
    "        ),\n",
    "    )\n",
    "    call_instruction = await main_cb.resource.get_only_descendant_as_view(\n",
    "        v_type=Instruction,\n",
    "        r_filter=ResourceFilter(\n",
    "            attribute_filters=(ResourceAttributeValueFilter(Instruction.Mnemonic, \"call\"),)\n",
    "        ),\n",
    "    )\n",
    "    await call_instruction.modify_assembly(\"call\", f\"0x{new_segment.p_vaddr:x}\")\n",
    "\n",
    "\n",
    "await call_new_segment_instead(root_resource, new_segment)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8429106",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### [4] **Compile** the `uppercase_and_print` patch source using _PatchMaker_\n",
    "\n",
    "The previous steps we've prepared `hello_world` to fit `uppercase_and_print` by modifying its structure in the resource tree. In this step we shall finally use _PatchMaker_ to compile the patch, which we shall break down into the following sub-steps:\n",
    "\n",
    "- Define the `ProgramAttributes` dataclass, which specifies the target CPU, ISA, etc.;\n",
    "- Define the `ToolchainConfig` dataclass, which specifies toolchain configuration parameters, such as optimization flags, reloc flags, etc.;\n",
    "- Initialize `PatchMaker` with the toolchain variant to use (LLVM, GCC, vbcc and version) and the symbol addresses to re-link, which in the case for `hello_world` is `puts`;\n",
    "- Define the `BOM` (Batch of Objects and Metadata) dataclass to include the source file with the `uppercase_and_print` patch;\n",
    "- Define the `PatchRegionConfig` dataclass describing the object files of the patch; then finally,\n",
    "- Compile the patch into a `FEM` (Final Executable and Metadata) object\n",
    "\n",
    "We can then use the FEM to extract the segment containing `uppercase_and_print` and re-import the contents of the patched segment into the OFRAK tree."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a48ef84f",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import tempfile\n",
    "import os\n",
    "\n",
    "bld_dir = tempfile.mkdtemp()\n",
    "c_patch_filename = \"c_patch.c\"\n",
    "exec_path = os.path.join(bld_dir, \"hello_world_patch_exec\")\n",
    "input_filename = \"hello_world\"\n",
    "output_filename = \"HELLO_WORLD\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18bf7d03",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "\n",
    "`hello_world` is a 64-bit x86 ELF binary compiled with GCC, however in this example we will compile the patch with LLVM. It works, but mixing objects compiled by different toolchains and compiler flags could increase the chances of un-predictable failures in other examples. Usually the most predictable results (and failures) are achieved by matching the toolchain configuration of the host binary, which can be inferred through some reverse engineering (fingerprinting, strings, debug info).\n",
    "\n",
    "Let's create the `ProgramAttributes` with information we know about the target binary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "946a040a",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from ofrak.core import ProgramAttributes\n",
    "from ofrak_type.architecture import InstructionSet\n",
    "from ofrak_type.bit_width import BitWidth\n",
    "from ofrak_type.endianness import Endianness\n",
    "\n",
    "proc = ProgramAttributes(\n",
    "    isa=InstructionSet.X86,\n",
    "    sub_isa=None,\n",
    "    bit_width=BitWidth.BIT_64,\n",
    "    endianness=Endianness.BIG_ENDIAN,\n",
    "    processor=None,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ac00fde",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "In supporting multiple toolchains, `ToolchainConfig` specifies generic compiler features (optimization level, force inline functions, etc.) which _PatchMaker_ maps into specific compiler flags provided to each toolchain. A full description can be found in the docstring for `ToolchainConfig` in `ofrak_patch_maker/ofrak_patch_maker/toolchain/model.py`. Let's compile our patch with `-OS` optimization, function inlining, as well as reduce extraneous scaffolding that may be added by the compiler:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "7202288c",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from ofrak_patch_maker.toolchain.model import (\n",
    "    ToolchainConfig,\n",
    "    BinFileType,\n",
    "    CompilerOptimizationLevel,\n",
    ")\n",
    "\n",
    "tc_config = ToolchainConfig(\n",
    "    file_format=BinFileType.ELF,\n",
    "    force_inlines=True,\n",
    "    relocatable=False,\n",
    "    no_std_lib=True,\n",
    "    no_jump_tables=True,\n",
    "    no_bss_section=True,\n",
    "    create_map_files=True,\n",
    "    compiler_optimization_level=CompilerOptimizationLevel.SPACE,\n",
    "    debug_info=False,\n",
    "    check_overlap=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f598d14",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Having specified `ProgramAttributes` and `ToolchainConfig`, we can set up the `PatchMaker` to link `extern int puts(char *str)` in our patch to the original address of `puts` in our target binary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "aa7e00b2",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Once the BOM is created, `extern int puts(char *str)` will point to 0x401030, the address of the `puts` thunk in `hello_world` which can be accessed from within the patch source code.\n"
     ]
    }
   ],
   "source": [
    "import logging\n",
    "\n",
    "from ofrak_patch_maker.patch_maker import PatchMaker\n",
    "from ofrak_patch_maker.toolchain.llvm_12 import LLVM_12_0_1_Toolchain\n",
    "\n",
    "# Get the complex block containing the code for `puts`\n",
    "puts_cb = await root_resource.get_only_descendant_as_view(\n",
    "    v_type=ComplexBlock,\n",
    "    r_filter=ResourceFilter(\n",
    "        attribute_filters=(ResourceAttributeValueFilter(ComplexBlock.Symbol, \"puts\"),)\n",
    "    ),\n",
    ")\n",
    "\n",
    "# Initialize the PatchMaker. This is where we tell it that our `_puts` will\n",
    "# need to be linked to the address of the existing `puts`.\n",
    "toolchain = LLVM_12_0_1_Toolchain(proc, tc_config)\n",
    "patch_maker = PatchMaker(\n",
    "    toolchain=toolchain,\n",
    "    logger=logging.getLogger(\"ToolchainTest\"),\n",
    "    build_dir=bld_dir,\n",
    "    base_symbols={\n",
    "        \"puts\": puts_cb.virtual_address,\n",
    "    },\n",
    ")\n",
    "\n",
    "print(\n",
    "    f\"Once the BOM is created, `extern int puts(char *str)` will point to 0x{puts_cb.virtual_address:x}, the address of the `puts` thunk in `hello_world` which can be accessed from within the patch source code.\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbd883fe",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "_PatchMaker_ can build a patch from multiple source, object and header files, so we ask it to generate a `BOM` (Batched Objects and Metadata):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "9fde02fe",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "BOM dataclass:\n",
      "name:                          hello_world_patch\n",
      "bss_size_required:             None\n",
      "entry_point_symbol (optional): None\n",
      "\n",
      "object map[c_patch]:           AssembledObject(path='/tmp/tmp10mi8gpg/hello_world_patch_bom_files/c_patch.c.o', file_format=<BinFileType.ELF: 'elf'>, segment_map=immutabledict({'.text': Segment(segment_name='.text', vm_address=0, offset=64, is_entry=False, length=70, access_perms=<MemoryPermissions.RX: 5>, is_allocated=True, is_bss=False, alignment=4)}), strong_symbols=immutabledict({'c_patch.c': (0, <LinkableSymbolType.UNDEF: -1>), 'uppercase_and_print': (0, <LinkableSymbolType.FUNC: 0>)}), unresolved_symbols=immutabledict({'puts': (0, <LinkableSymbolType.UNDEF: -1>)}), bss_size_required=None)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# The BOM basically corresponds to the step of building the object files, before linking,\n",
    "# and gives us more fine-grained control of the build step if we wish to.\n",
    "bom = patch_maker.make_bom(\n",
    "    name=\"hello_world_patch\",\n",
    "    source_list=[c_patch_filename],\n",
    "    object_list=[],\n",
    "    header_dirs=[],\n",
    ")\n",
    "\n",
    "print(\n",
    "    f\"BOM dataclass:\\n\"\n",
    "    f\"name:                          {bom.name}\\n\"\n",
    "    f\"bss_size_required:             {bom.bss_size_required}\\n\"\n",
    "    f\"entry_point_symbol (optional): {bom.entry_point_symbol}\\n\\n\"\n",
    "    f\"object map[c_patch]:           {bom.object_map['c_patch.c']}\\n\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb1f9f9f",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Then we generate the `PatchRegionConfig`, which describes the segment we've extended in step [3] and maps it to the `uppercase_and_print` code blob that we are about to compile:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "635e0eed",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from ofrak_patch_maker.model import PatchRegionConfig\n",
    "from ofrak_patch_maker.toolchain.model import Segment\n",
    "from ofrak_type.memory_permissions import MemoryPermissions\n",
    "\n",
    "# Tell the PatchMaker about the segment we added in the binary...\n",
    "text_segment_uppercase = Segment(\n",
    "    segment_name=\".text\",\n",
    "    vm_address=new_segment.p_vaddr,\n",
    "    offset=0,\n",
    "    is_entry=False,\n",
    "    length=new_segment.p_filesz,\n",
    "    access_perms=MemoryPermissions.RX,\n",
    ")\n",
    "\n",
    "# ... And that we want to put the compiled C patch there.\n",
    "uppercase_object = bom.object_map[c_patch_filename]\n",
    "segment_dict = {\n",
    "    uppercase_object.path: (text_segment_uppercase,),\n",
    "}\n",
    "\n",
    "# Generate a PatchRegionConfig incorporating the previous information\n",
    "p = PatchRegionConfig(bom.name + \"_patch\", segment_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0fdf9a21",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "And finally for this step, we compile the patch into an FEM (Final Executable and Metadata) and write it to disk:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "80577a17",
   "metadata": {
    "pycharm": {
     "is_executing": true,
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "ld.lld: warning: cannot find entry symbol _start; not setting start address\n"
     ]
    }
   ],
   "source": [
    "from ofrak_patch_maker.toolchain.utils import get_file_format\n",
    "\n",
    "fem = patch_maker.make_fem([(bom, p)], exec_path)\n",
    "\n",
    "assert os.path.exists(exec_path)\n",
    "assert get_file_format(exec_path) == tc_config.file_format"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2b983b8",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### [5] Inject the extended ELF segment with the compiled patch blob using OFRAK _BinaryPatchModifier_\n",
    "Having compiled the patch, we shall extract `uppercase_and_print` from the FEM into the extended segment we created in step [3] on the OFRAK resource tree:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "65177471",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "OFRAK tree patched!\n"
     ]
    }
   ],
   "source": [
    "from ofrak.core import BinaryPatchModifier, BinaryPatchConfig\n",
    "\n",
    "# At this point, the PatchMaker has produced an executable containing our new segment.\n",
    "# Let's read it, find the binary data of the segment, and finally patch that binary\n",
    "# data into our resource.\n",
    "with open(fem.executable.path, \"rb\") as f:\n",
    "    exe_data = f.read()\n",
    "\n",
    "# Retrieve the binary data of our new segment\n",
    "segment_data = b\"\"\n",
    "for segment in fem.executable.segments:\n",
    "    if segment.length == 0 or segment.vm_address == 0:\n",
    "        continue\n",
    "    segment_data = exe_data[segment.offset : segment.offset + segment.length]\n",
    "    break\n",
    "assert len(segment_data) != 0\n",
    "\n",
    "# Patch the compiled code in the new_segment\n",
    "patch_config = BinaryPatchConfig(new_segment.p_offset, segment_data)\n",
    "await root_resource.run(BinaryPatchModifier, patch_config)\n",
    "\n",
    "print(\"OFRAK tree patched!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d70af45f",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### [6] Pack the OFRAK resource we've modified back into an executable, so we can run and test our work!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "8147b30e",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "await root_resource.pack()\n",
    "await root_resource.flush_data_to_disk(output_filename)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d25df90",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Let's test our results!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "cadf4555",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Original: Hello, World!\n",
      "Modified: HELLO, WORLD!\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "\n",
    "printf \"Original: \"\n",
    "./hello_world\n",
    "\n",
    "printf \"Modified: \"\n",
    "chmod +x ./HELLO_WORLD && ./HELLO_WORLD"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4a636e7",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "If you can see the capitalized \"HELLO, WORLD!\" output, the patch has successfully been applied. Feel free to experiment with the patch source and toolchain configuration in this notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "948822f5",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "[Next page](7_conclusion.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.18"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
